Estimating the Quality of Translated User-Generated Content
Abstract
Previous research on quality estimation for machine translation has demonstrated the possibility of predicting the translation quality of well-formed data. We present a first study on estimating the translation quality of user-generated content. Our dataset contains English technical forum comments which were translated into French by three automatic systems. These translations were rated in terms of both comprehensibility and fidelity by human annotators. Our experiments show that tried-and-tested quality estimation features work well on this type of data but that extending this set can be beneficial. We also show that the performance of particular types of features depends on the type of system used to produce the translation.
Similar resources
Evaluation of Machine-Translated User Generated Content: A pilot study based on User Ratings
This paper presents the results of an experimental pilot user study, focusing on the evaluation of machine-translated user-generated content by users of an online community forum and how those users interact with the MT content that is presented to them. Preliminary results show that ratings are very difficult to obtain, that a low percentage of posts (21%) was rated, that users need to be well...
Semantic Tagging and Inference in Online Communities
In this paper we present UsTag, an approach for providing user-defined semantics for user-generated content (UGC) and processing those semantics with user-defined rules. User semantics are provided through a tagging mechanism extended to express relationships within the content. These relationships are translated to RDF triples. RDF triples along with user-defined rules enable the creation of...
Exploring the Popularity, Reputation and Certification of User-Generated Software
User-generated content has reshaped the landscape of the information marketplace in recent years. Among this content, software is a particularly impactful class. Compared with regular resources, estimating the popularity, reputation and certification of user-generated software in a distributed architecture appears to be a challenging task. Furthermore, how to use popularity to help users to disco...
Maintaining Sentiment Polarity in Translation of User-Generated Content
The advent of social media has shaken the very foundations of how we share information, with Twitter, Facebook, and LinkedIn among many well-known social networking platforms that facilitate information generation and distribution. However, the 140-character limit in Twitter encourages users to write informally, sometimes deliberately. As a result, machine ...
Assessment of uncertainty for coal quality-tonnage curves through minimum spatial cross-correlation simulation
Coal quality-tonnage curves are helpful tools in optimum mine planning and can be estimated using geostatistical simulation methods. In the presence of spatially cross-correlated variables, traditional co-simulation methods are impractical and time-consuming. This paper investigates a factor simulation approach based on minimization of spatial cross-correlations with the objective of modeling s...